Goto


ReLU activation


Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

Papagiannouli, Katerina, Trevisan, Dario, Zitto, Giuseppe Pio

arXiv.org Machine Learning

We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emergent notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel-selection effects.
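As a point of reference for the fixed-kernel NNGP baseline the abstract contrasts with (this is not code from the paper), here is a minimal NumPy sketch of the single-hidden-layer ReLU NNGP kernel, i.e. the degree-1 arc-cosine kernel, together with a Monte-Carlo check against a wide random-feature layer; all names are illustrative.

```python
import numpy as np

def relu_nngp_kernel(x1, x2):
    """Degree-1 arc-cosine kernel: the NNGP kernel of one ReLU hidden
    layer with standard Gaussian weights,
        K(x, x') = ||x|| ||x'|| (sin t + (pi - t) cos t) / (2 pi),
    where t is the angle between x and x'."""
    n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
    cos_t = np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0)
    t = np.arccos(cos_t)
    return n1 * n2 * (np.sin(t) + (np.pi - t) * cos_t) / (2.0 * np.pi)

# Monte-Carlo sanity check against a wide random-feature ReLU layer.
rng = np.random.default_rng(0)
d, width = 5, 200_000
x, y = rng.standard_normal(d), rng.standard_normal(d)
W = rng.standard_normal((width, d))
mc = np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ y, 0.0))
print(f"Monte-Carlo: {mc:.3f}   closed form: {relu_nngp_kernel(x, y):.3f}")
```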



On the Similarity between the Laplace and Neural Tangent Kernels

Neural Information Processing Systems

Finally, we provide experiments on real data comparing NTK and the Laplace kernel, along with a larger class of γ-exponential kernels. We show that these perform almost identically.
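As a rough numerical illustration only (not the paper's code, and the paper's comparison is based on the kernels' spectral properties and real-data experiments), the sketch below evaluates a two-layer ReLU NTK and a γ-exponential kernel on the unit sphere as functions of the inner product u = x·x'; the bandwidth c = 1 and the normalization are arbitrary assumptions.

```python
import numpy as np

def relu_ntk(u):
    """Two-layer ReLU NTK on the unit sphere as a function of u = x.x'
    (both layers trained, standard Gaussian initialization)."""
    u = np.clip(u, -1.0, 1.0)
    t = np.arccos(u)
    k1 = (np.sin(t) + (np.pi - t) * u) / (2.0 * np.pi)  # NNGP term
    k0 = (np.pi - t) / (2.0 * np.pi)                     # derivative term
    return k1 + u * k0

def gamma_exponential(u, c=1.0, gamma=1.0):
    """gamma-exponential kernel exp(-c ||x - x'||^gamma) for unit-norm
    inputs; gamma = 1 recovers the Laplace kernel."""
    dist = np.sqrt(np.clip(2.0 - 2.0 * u, 0.0, None))
    return np.exp(-c * dist ** gamma)

u = np.linspace(-1.0, 1.0, 9)
ntk = relu_ntk(u)
ntk = ntk / ntk[-1]   # scale so K(x, x) = 1, matching the Laplace kernel
lap = gamma_exponential(u, c=1.0, gamma=1.0)
for ui, a, b in zip(u, ntk, lap):
    print(f"u = {ui:+.2f}   NTK = {a:.3f}   Laplace = {b:.3f}")
```

The pointwise values differ at large angles; the similarity claimed in the paper concerns how the kernels behave in learning, not exact pointwise equality.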



A Proof. A.1 Proof of Theorem 1. We leverage the results in [49]

Neural Information Processing Systems

Lemma 3. Consider the ReLU activation. The proof of Theorem 1 is given below. Inequality (3) uses the strict monotonicity of p(·). Code is available at this link. The neural networks are updated using Adam with the learning rate initialized at 0.035, and none of them have communication constraints. The training time is shown in Table 1.
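The snippet only pins down the optimizer and its initial learning rate; the minimal PyTorch sketch below shows that setup with a placeholder network and dummy data (the architecture, loss, and step count are assumptions, not the authors' configuration).

```python
import torch
import torch.nn as nn

# Only Adam and the initial learning rate of 0.035 come from the text;
# the network, data, and step count below are placeholders.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.035)
loss_fn = nn.MSELoss()

X, y = torch.randn(256, 10), torch.randn(256, 1)  # dummy data
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```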


Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

Neural Information Processing Systems

Empirical results, however, suggest that networks of moderate size already yield appealing performance. To explain such a gap, a common belief is that many data sets exhibit low-dimensional structure and can be modeled as samples near a low-dimensional manifold.
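As a toy illustration of that belief (an assumed setup, not the paper's experiments), the sketch below embeds a one-dimensional curve in a 20-dimensional ambient space and fits a moderately sized ReLU network to a function of the intrinsic coordinate.

```python
import torch
import torch.nn as nn

# Assumed toy setup: inputs lie on a 1-D curve embedded in R^20, and the
# regression target depends only on the intrinsic coordinate t.
torch.manual_seed(0)
t = torch.rand(2000, 1) * 2 * torch.pi                       # intrinsic coordinate
embed = torch.randn(2, 20)                                    # random linear embedding
X = torch.cat([torch.sin(t), torch.cos(t)], dim=1) @ embed   # curve in R^20
y = torch.sin(3 * t)                                          # target defined on the manifold

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print(f"training MSE of a moderate ReLU net in ambient dim 20: {loss.item():.4f}")
```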